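Reconfigurable intelligent surfaces (RIS) can significantly enhance the service coverage of terahertz massive multiple-input multiple-output (MIMO) communication systems. However, obtaining accurate high-dimensional channel state information (CSI) with limited pilot and feedback signaling overhead is challenging, which severely degrades the performance of conventional spatial division multiple access. To improve robustness against CSI imperfection, this paper proposes a deep learning (DL)-based rate-splitting multiple access (RSMA) scheme for RIS-aided terahertz multi-user MIMO systems. Specifically, we first propose a hybrid data-model driven DL-based RSMA precoding scheme, comprising the passive precoding at the RIS as well as the analog active precoding and the RSMA digital active precoding at the base station (BS). To realize the passive precoding at the RIS, we propose a Transformer-based data-driven RIS reflecting network (RRN). For the analog active precoding at the BS, we propose a matched-filter-based analog precoding scheme, since the BS and RIS adopt a LoS-MIMO antenna array architecture. For the RSMA digital active precoding at the BS, we propose a low-complexity approximate weighted minimum mean square error (AWMMSE) digital precoding scheme. Furthermore, for better precoding performance as well as lower computational complexity, a model-driven deep unfolding active precoding network (DFAPN) is also designed by combining the proposed AWMMSE scheme with DL. Then, to acquire accurate CSI at the BS so that the RSMA precoding scheme can achieve higher spectral efficiency, we propose a CSI acquisition network (CAN) with low pilot and feedback signaling overhead, in which the downlink pilot transmission, the CSI feedback at the user equipments (UEs), and the CSI reconstruction at the BS are modeled as a Transformer-based end-to-end neural network.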
translated by 谷歌翻译
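The prototypical part network (ProtoPNet) has drawn wide attention and spurred many follow-up studies because of its self-explanatory property for explainable artificial intelligence (XAI). However, when ProtoPNet is applied directly to vision transformer (ViT) backbones, the learned prototypes suffer from a "distraction" problem: they have a relatively high probability of being activated by the background and pay less attention to the foreground. The powerful capability of modeling long-term dependencies makes it hard for a transformer-based ProtoPNet to focus on prototypical parts, which severely impairs its inherent interpretability. This paper proposes the prototypical part transformer (ProtoPFormer) to apply the prototype-based method to ViTs appropriately and effectively for interpretable image recognition. The proposed method introduces global and local prototypes that capture and highlight the representative holistic and partial features of targets according to the architectural characteristics of ViTs. The global prototypes provide a global view of objects and guide the local prototypes to concentrate on the foreground while eliminating the influence of the background. The local prototypes are then explicitly supervised to concentrate on their respective prototypical visual parts, increasing the overall interpretability. Extensive experiments demonstrate that the proposed global and local prototypes can mutually correct each other and jointly make the final decisions, reasoning about the decision-making process faithfully and transparently from the holistic and local perspectives, respectively. Moreover, ProtoPFormer consistently achieves superior performance and visualization results over state-of-the-art (SOTA) prototype-based baselines. Our code has been released at https://github.com/zju-vipa/protopformer.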
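Demand estimation plays an important role in dynamic pricing, where the optimal price can be obtained by maximizing revenue based on the demand curve. On online hotel booking platforms, the demand or occupancy of rooms varies across room types and changes over time, so obtaining an accurate occupancy estimate is challenging. In this paper, we propose a novel hotel demand function that explicitly models the price elasticity of demand for occupancy prediction, and we design a price elasticity prediction model that learns the dynamic price elasticity coefficient from a variety of influencing factors. Our model consists of carefully designed elasticity learning modules that alleviate the endogeneity problem, and it is trained in a multi-task framework to address data sparsity. We conduct comprehensive experiments on real-world datasets and validate the superiority of our method over state-of-the-art baselines for both occupancy prediction and dynamic pricing.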
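As a rough illustration of how a price-elasticity demand curve feeds into dynamic pricing, the sketch below assumes a constant-elasticity form, occupancy(p) = base_occupancy * (p / base_price)^(-elasticity), capped at full occupancy. The functional form, the function names, and the candidate price grid are assumptions made for illustration; they are not the demand function or the learned model proposed in the abstract.

```python
def occupancy(price, base_price, base_occupancy, elasticity):
    """Assumed constant-elasticity demand curve, capped at 100% occupancy."""
    raw = base_occupancy * (price / base_price) ** (-elasticity)
    return min(1.0, raw)


def revenue(price, base_price, base_occupancy, elasticity):
    """Expected revenue per room: price times predicted occupancy."""
    return price * occupancy(price, base_price, base_occupancy, elasticity)


def best_price(candidates, base_price, base_occupancy, elasticity):
    """Pick the candidate price with the highest expected revenue."""
    return max(
        candidates,
        key=lambda p: revenue(p, base_price, base_occupancy, elasticity),
    )


if __name__ == "__main__":
    # Hypothetical numbers: 50% occupancy at a reference price of 100.
    grid = [50, 70, 80, 100, 200]
    print(best_price(grid, base_price=100, base_occupancy=0.5, elasticity=2.0))
```

In this toy setting the cap matters: below a certain price the hotel is already full, so lowering the price further only loses revenue, while above it the elastic demand falls faster than the price rises.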
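Recently, the convolution-augmented Transformer (Conformer) has shown promising results in automatic speech recognition (ASR), outperforming the previous best published Transformer Transducer. In this work, we argue that the output information of each block in the encoder and decoder is not completely inclusive; in other words, the outputs of different blocks may be complementary. We study how to exploit the complementary information of each block in a parameter-efficient way, with the expectation that this leads to stronger performance. We therefore propose the Block-augmented Transformer for speech recognition, named Blockformer. We implement two block ensemble methods: the base Weighted Sum of the Blocks Output (Base-WSBO), and the Squeeze-and-Excitation module applied to the Weighted Sum of the Blocks Output (SE-WSBO). Experiments show that Blockformer significantly outperforms state-of-the-art Conformer-based models on AISHELL-1: our model achieves a CER of 4.35% without a language model and 4.10% with an external language model on the test set.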
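The idea of combining block outputs by a weighted sum can be sketched as follows. This is a minimal illustration only: it assumes one scalar weight per block, normalized with a softmax, and represents each block output as a plain list of frames. The names `softmax` and `weighted_sum_of_blocks` are hypothetical, and the exact weighting used by Base-WSBO or SE-WSBO may differ.

```python
import math


def softmax(weights):
    """Normalize raw block weights so that they are positive and sum to 1."""
    peak = max(weights)
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum_of_blocks(block_outputs, raw_weights):
    """Combine per-block outputs of shape [frames][features] into one output.

    block_outputs: list with one entry per block, all with the same shape.
    raw_weights: one scalar per block (learnable in a real model).
    """
    norm = softmax(raw_weights)
    frames = len(block_outputs[0])
    features = len(block_outputs[0][0])
    return [
        [
            sum(w * block[t][d] for w, block in zip(norm, block_outputs))
            for d in range(features)
        ]
        for t in range(frames)
    ]


if __name__ == "__main__":
    # Two hypothetical blocks, one frame, two features each.
    outputs = [[[1.0, 2.0]], [[3.0, 4.0]]]
    print(weighted_sum_of_blocks(outputs, [0.0, 0.0]))
```

With equal raw weights the result is simply the average of the block outputs; training the weights lets the model emphasize whichever blocks carry the most useful information, at the cost of only one extra parameter per block.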
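The wide spread of fake news is increasingly threatening both individuals and society. Great efforts have been made toward automatic fake news detection within a single domain (e.g., politics). However, correlations commonly exist across multiple news domains, so it is promising to detect fake news in multiple domains simultaneously. Based on our analysis, we pose two challenges in multi-domain fake news detection: 1) domain shift, caused by discrepancies among domains in terms of words, emotions, styles, etc.; and 2) domain labeling incompleteness, stemming from real-world categorization that outputs only a single domain label regardless of the topic diversity of a news piece. In this paper, we propose a Memory-guided Multi-view Multi-domain Fake News Detection Framework (M$^3$FEND) to address these two challenges. We model news pieces from a multi-view perspective, including semantics, emotion, and style. Specifically, we propose a Domain Memory Bank to enrich domain information, which can discover potential domain labels based on seen news pieces and model domain characteristics. Then, with the enriched domain information as input, a Domain Adapter adaptively aggregates discriminative information from the multiple views for news in various domains. Extensive offline experiments on English and Chinese datasets demonstrate the effectiveness of M$^3$FEND, and online tests verify its superiority in practice. Our code is available at https://github.com/ictmcg/m3fend.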
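In aerial hybrid massive multiple-input multiple-output (MIMO) and orthogonal frequency division multiplexing (OFDM) systems, designing spectrally efficient broadband multi-user hybrid beamforming with limited pilot and feedback overhead is challenging. To this end, by modeling the key transmission modules as an end-to-end (E2E) neural network, this paper proposes a data-driven deep learning (DL)-based unified hybrid beamforming framework for both time division duplex (TDD) and frequency division duplex (FDD) systems with implicit channel state information (CSI). For TDD systems, the proposed DL-based approach jointly models the uplink pilot combining and downlink hybrid beamforming modules as an E2E neural network. For FDD systems, we jointly model the downlink pilot transmission, uplink CSI feedback, and downlink hybrid beamforming modules as an E2E neural network. Unlike conventional approaches that process the modules separately, the proposed solution optimizes all modules simultaneously, with the sum rate as the optimization objective. Therefore, by perceiving the inherent properties of air-to-ground massive MIMO-OFDM channel samples, the DL-based E2E neural network can establish the mapping from the channel to the beamformer, so that explicit channel reconstruction can be avoided and the pilot and feedback overhead reduced. In addition, practical low-resolution phase shifters (PSs) introduce a quantization constraint, which makes gradient backpropagation intractable when training the neural network. To mitigate the performance loss caused by the phase quantization error, we adopt a transfer learning strategy that further fine-tunes the E2E neural network starting from a network pre-trained under the assumption of ideal infinite-resolution PSs. Numerical results show that our DL-based schemes have considerable advantages over state-of-the-art schemes.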
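To see why low-resolution phase shifters complicate gradient-based training, consider a uniform B-bit phase quantizer. The sketch below is an assumption for illustration (the abstract does not specify the quantizer): it maps any phase to the nearest of 2^B evenly spaced levels, so the output is piecewise constant and its derivative with respect to the input is zero almost everywhere. The name `quantize_phase` is hypothetical.

```python
import math


def quantize_phase(theta, bits):
    """Map a phase in radians to the nearest of 2**bits uniform levels in [0, 2*pi)."""
    levels = 2 ** bits
    step = 2.0 * math.pi / levels
    # Round to the nearest level index and wrap around the unit circle.
    index = round(theta / step) % levels
    return index * step


if __name__ == "__main__":
    # With 2 bits the available phases are 0, pi/2, pi, and 3*pi/2.
    for theta in (0.1, 0.2, math.pi / 2 + 0.1, 2.0 * math.pi - 0.1):
        print(round(theta, 3), "->", round(quantize_phase(theta, bits=2), 3))
```

Because nearby inputs such as 0.1 and 0.2 map to the same output, a small change in the network's phase prediction does not change the quantized phase, which is why training first with ideal infinite-resolution phases and then fine-tuning is a natural workaround.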
Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition. In this paper, we investigate the problem of scene text spotting, which aims at simultaneous text detection and recognition in natural images. An end-to-end trainable neural network model for scene text spotting is proposed. The proposed model, named Mask TextSpotter, is inspired by the newly published work Mask R-CNN. Unlike previous methods that also accomplish text spotting with end-to-end trainable deep neural networks, Mask TextSpotter takes advantage of a simple and smooth end-to-end learning procedure, in which precise text detection and recognition are achieved via semantic segmentation. Moreover, it is superior to previous methods in handling text instances of irregular shapes, for example, curved text. Experiments on ICDAR2013, ICDAR2015 and Total-Text demonstrate that the proposed method achieves state-of-the-art results in both scene text detection and end-to-end text recognition tasks.
Although recent deep learning methods, especially generative models, have shown good performance in fast magnetic resonance imaging, there is still much room for improvement in high-dimensional generation. Considering that the internal dimensions of score-based generative models have a critical impact on estimating the gradient of the data distribution, we present a new idea, the low-rank tensor assisted k-space generative model (LR-KGM), for parallel imaging reconstruction. This means that we transform the original prior information into high-dimensional prior information for learning. More specifically, the multi-channel data is constructed into a large Hankel matrix, and the matrix is subsequently folded into a tensor for prior learning. In the testing phase, a low-rank rotation strategy is used to impose low-rank constraints on the tensor output of the generative network. Furthermore, we alternate between traditional generative iterations and low-rank high-dimensional tensor iterations for reconstruction. Experimental comparisons with state-of-the-art methods demonstrated that the proposed LR-KGM method achieved better performance.
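For readers unfamiliar with Hankel structure, the sketch below builds a Hankel matrix from a one-dimensional sequence by stacking sliding windows, so that every anti-diagonal holds a single repeated sample. It is a deliberately simplified assumption: the abstract's construction operates on multi-channel k-space data and then folds the matrix into a tensor, which is not reproduced here. The name `hankel_from_sequence` is hypothetical.

```python
def hankel_from_sequence(samples, window):
    """Stack all length-`window` sliding windows of `samples` as matrix rows.

    Entry (i, j) equals samples[i + j], so anti-diagonals are constant.
    """
    if not 1 <= window <= len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    rows = len(samples) - window + 1
    return [list(samples[i:i + window]) for i in range(rows)]


if __name__ == "__main__":
    for row in hankel_from_sequence([1, 2, 3, 4], window=2):
        print(row)
```

The redundancy introduced by repeating samples across rows is what makes such matrices approximately low rank for smooth or structured signals, which is the property low-rank constraints exploit.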
The recently developed discrete diffusion models perform extraordinarily well in the text-to-image task, showing significant promise for handling multi-modality signals. In this work, we harness these traits and present a unified multimodal generation model that can conduct both the "modality translation" and "multi-modality generation" tasks using a single model, performing text-based, image-based, and even vision-language simultaneous generation. Specifically, we unify the discrete diffusion process for multimodal signals by proposing a unified transition matrix. Moreover, we design a mutual attention module with a fused embedding layer and a unified objective function to emphasise the inter-modal linkages, which are vital for multi-modality generation. Extensive experiments indicate that our proposed method can perform comparably to the state-of-the-art solutions in various generation tasks.
Point cloud completion, as the upstream procedure of 3D recognition and segmentation, has become an essential part of many tasks such as navigation and scene understanding. While various point cloud completion models have demonstrated their powerful capabilities, their robustness against adversarial attacks, which have been proven to be fatally malicious towards deep neural networks, remains unknown. In addition, existing attack approaches towards point cloud classifiers cannot be applied to the completion models due to different output forms and attack purposes. In order to evaluate the robustness of the completion models, we propose PointCA, the first adversarial attack against 3D point cloud completion models. PointCA can generate adversarial point clouds that maintain high similarity with the original ones, while being completed as another object with totally different semantic information. Specifically, we minimize the representation discrepancy between the adversarial example and the target point set to jointly explore the adversarial point clouds in the geometry space and the feature space. Furthermore, to launch a stealthier attack, we innovatively employ the neighbourhood density information to tailor the perturbation constraint, leading to geometry-aware and distribution-adaptive modifications for each point. Extensive experiments against different premier point cloud completion networks show that PointCA can cause a performance degradation from 77.9% to 16.7%, with the structure chamfer distance kept below 0.01. We conclude that existing completion models are severely vulnerable to adversarial examples, and state-of-the-art defenses for point cloud classification will be partially invalid when applied to incomplete and uneven point cloud data.
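The chamfer distance mentioned above measures how far apart two point sets are. The sketch below uses one common symmetric variant, the mean squared nearest-neighbour distance from each set to the other, summed over both directions. Conventions differ (squared versus unsquared distances, sum versus mean), and the abstract does not state which one is used, so this choice and the function names are assumptions for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(set_a, set_b):
    """Symmetric chamfer distance using mean squared nearest-neighbour distances."""
    a_to_b = sum(
        min(squared_distance(p, q) for q in set_b) for p in set_a
    ) / len(set_a)
    b_to_a = sum(
        min(squared_distance(q, p) for p in set_a) for q in set_b
    ) / len(set_b)
    return a_to_b + b_to_a


if __name__ == "__main__":
    original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    perturbed = [(0.0, 0.1, 0.0), (1.0, 0.0, 0.1)]
    print(chamfer_distance(original, perturbed))
```

This brute-force version compares every pair of points, which is fine for small examples; large point clouds usually need a spatial index or a batched GPU implementation.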